Molecular characterization and genetic diversity analysis in Indian mustard (Brassica juncea L. Czern & Coss.) varieties using SSR markers

In this study, we evaluated genetic diversity in a panel of 87 Indian mustard varieties using 200 genomic-SSR markers. A total of 189 SSRs amplified successfully, with 174 (92.06%) SSRs generating polymorphic products and 15 (7.94%) SSRs producing monomorphic amplicons. A total of 552 alleles were obtained, and the allele number varied from 2 to 6 with an average of 3.17 alleles per SSR marker. The major allele frequency ranged from 0.29 (ENA23) to 0.92 (BrgMS841) with an average value of 0.58 per SSR locus. The polymorphic information content (PIC) value ranged from 0.10 (BrgMS841) to 0.68 (BrgMS519) with a mean of 0.39. The gene diversity per locus ranged from 0.13 (BrgMS841) to 0.72 (ENA23 & BrgMS519) with a mean value of 0.48 per SSR primer pair. Both the Unweighted Neighbor Joining-based dendrogram and population structure analysis divided the 87 varieties into two major groups/subpopulations. Analysis of molecular variance (AMOVA) indicated more genetic variation among individuals (98%) than among groups (2%). A total of 31 SSRs produced 36 unique alleles for 27 varieties, which will serve as unique DNA fingerprints for the identification and legal protection of these varieties. Further, the results provide a deeper insight into the genetic structure of Indian mustard varieties in India and will assist in formulating future breeding strategies aimed at Indian mustard genetic improvement.


Introduction
Indian mustard (Brassica juncea L. Czern & Coss.) is an economically important oilseed crop of the Rapeseed-Mustard group, belonging to the family Brassicaceae, with a physical genome size of 922 Mb [1]. It is an amphidiploid crop (AABB, 2n = 36) that evolved by natural hybridization between two primary diploids, B. rapa (AA, 2n = 20) and B. nigra (BB, 2n = 16), followed by subsequent chromosomal duplication in nature [2]. Presently, it is being cultivated in Canada, some European countries, Russia, Australia, China, Pakistan, India and Bangladesh. [...] SSR markers [16] for genetic diversity and population structure analysis and developing molecular tags/DNA fingerprints for Indian mustard cultivars.

Plant material
Selfed, pure seeds of eighty-seven Indian mustard varieties were obtained from the DUS division of ICAR-DRMR, Bharatpur, Rajasthan, and constituted the plant material for this study. Detailed information on these varieties, along with their developing center, pedigree and release year, is given in Table 1. SSR genotyping was carried out in the Molecular Biology Laboratory of ICAR-DRMR, Bharatpur, Rajasthan, India.

DNA extraction and purification
DNA from pooled fresh, young leaves of five plants per genotype was extracted and purified using the high stringency protocol already established in our laboratory [17]. The concentration of purified DNA was estimated by electrophoresis on a 0.8% agarose gel alongside a lambda DNA ladder.

SSR primers and polymerase chain reaction (PCR) amplification
A panel of 200 genomic-SSRs covering all eighteen linkage groups of Indian mustard [16] was chosen for genotyping of the Indian mustard varieties. Polymerase chain reactions were run following the high stringency protocol already standardized in our laboratory [18]. PCR amplicons were resolved on a 3.5% Super Fine Resolution (SFR) agarose (Amresco, USA) gel, with a 50 bp DNA ladder as a size reference on both sides of the gel, and analyzed in a gel documentation system (Syngene, UK).

Data analysis
PCR amplicons of different sizes were considered as different alleles. An allelic size data matrix was prepared and analyzed in PowerMarker v.3.25 [19] to calculate major allele frequency (MAF), polymorphism information content (PIC) value and gene diversity. The variety-wise allelic composition of SSR markers was prepared from the score sheet, and unique alleles (a particular allele appearing in only one variety) were identified as the DNA fingerprint for that variety. A UNJ dendrogram was constructed using DARwin 5.0 software [20] to decipher the genetic relationships among the varieties used.
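The three per-locus statistics named above can be illustrated with a minimal Python sketch. It uses the uncorrected textbook forms (gene diversity as 1 minus the sum of squared allele frequencies, and PIC in the Botstein form) and assumes one allele call per variety at each locus. PowerMarker may apply sample-size corrections, so its output can differ slightly from this sketch. The function name `locus_stats` and the input layout are illustrative assumptions, not part of the software used in the study.

```python
from collections import Counter


def locus_stats(alleles):
    """Return (major allele frequency, gene diversity, PIC) for one SSR locus.

    `alleles` holds one allele size (bp) per variety; None marks missing data.
    Assumption: each variety contributes a single allele call per locus.
    """
    observed = [a for a in alleles if a is not None]
    n = len(observed)
    freqs = [count / n for count in Counter(observed).values()]

    maf = max(freqs)                      # frequency of the most common allele
    sum_sq = sum(p * p for p in freqs)
    gene_diversity = 1.0 - sum_sq         # expected heterozygosity, uncorrected

    # PIC subtracts 2 * p_i^2 * p_j^2 over all unordered allele pairs i < j.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = gene_diversity - cross
    return maf, gene_diversity, pic


# Hypothetical locus scored in four varieties (allele sizes in bp).
maf, h, pic = locus_stats([100, 100, 100, 102])
```

For a locus with two equally frequent alleles this gives a gene diversity of 0.5 and a PIC of 0.375, which shows why PIC is always somewhat lower than gene diversity at the same locus.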

Population structure analysis and AMOVA
Population structure of the Indian mustard cultivars was analyzed with STRUCTURE v2.3.4 software using the admixture model [21]. For each value of K (from 1 to 9), five independent runs were carried out with a burn-in period of 1,000,000 followed by 100,000 Markov Chain Monte Carlo (MCMC) replications. STRUCTURE HARVESTER v.6.92 [22] was used to determine the optimum K value from the log probability of data [ln P(D)]. Indian mustard genotypes were classified into two classes: pure, the genotypes with ≥80% affiliation probabilities, and admixture, those with <80% affiliation probabilities. GenAlEx6.5 software [23] was used for analysis of molecular variance (AMOVA).
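The pure/admixture rule described above can be written as a short Python sketch. The function name `classify` and the list-of-coefficients input are illustrative assumptions; the 0.80 threshold is the one stated in the text.

```python
def classify(q, threshold=0.80):
    """Assign a genotype to its best-supported subpopulation.

    `q` is a list of membership coefficients, one per subpopulation,
    as reported for each genotype by a structure analysis.
    Returns (1-based subpopulation index, "pure" or "admixture").
    """
    best = max(range(len(q)), key=lambda k: q[k])
    status = "pure" if q[best] >= threshold else "admixture"
    return best + 1, status


# Hypothetical membership coefficients for two genotypes at K = 2.
first = classify([0.93, 0.07])    # (1, "pure")
second = classify([0.35, 0.65])   # (2, "admixture")
```

A genotype is always assigned to the subpopulation with the largest coefficient; the threshold only decides whether that assignment is labelled pure or admixed.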

Allelic diversity and genetic inter-relationship analysis
Out of 200 SSRs evaluated, 189 SSR markers produced clear and scorable bands, while the remaining 11 exhibited no amplification at all. Among the amplified SSR markers, 174 (92.06%) SSRs amplified polymorphic products, whereas 15 (7.94%) SSRs yielded monomorphic amplicons. Various allelic diversity parameters of the SSR markers used in this study, including number of alleles, major allele frequency (MAF), PIC value and gene diversity, are presented in Table 2. A total of 552 alleles were obtained, and the allele number ranged from 2 to 6 with an average of 3.17 alleles per SSR locus. The major allele frequency varied from 0.29 (ENA23) to 0.92 (BrgMS841) with a mean value of 0.58 per SSR marker. The PIC value defines the discriminatory power of a marker and reflects allelic diversity and frequency among the genotypes. The PIC value ranged from 0.10 (BrgMS841) to 0.68 (BrgMS519) with a mean of 0.39. The gene diversity per locus ranged from 0.13 (BrgMS841) to 0.72 (ENA23 & BrgMS519) with a mean value of 0.48 per SSR primer pair. SSR marker polymorphism level is generally measured in terms of PIC values. The discriminatory power of an SSR marker can be defined as high for PIC values >0.50, moderate for PIC values in the range of 0.25 to 0.50 and low for PIC values <0.25 [24]. In the present study, 19 (10.92%) SSR markers were highly polymorphic and informative with PIC values >0.50, 154 (88.51%) SSRs were moderately polymorphic (PIC values in the range of 0.25 to 0.50) and one (0.57%) SSR marker exhibited a low degree of informativeness and polymorphic potential (PIC value <0.25). A total of 44 SSR markers had PIC values greater than the mean PIC value, which suggests that these SSRs can be used for various trait mapping studies in B. juncea. The UNJ-based grouping method, using a Euclidean distance matrix derived from the SSR allelic data, grouped the 87 varieties into two major clusters (Fig 1). Fifty-seven varieties were grouped into cluster I and the remaining 30 into cluster II.
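The PIC classes and the reported percentages can be checked with a small Python sketch. The thresholds and the counts (19, 154 and 1 out of 174 polymorphic markers) are taken from the text; the function name `pic_class` is an illustrative assumption.

```python
def pic_class(pic):
    """Bin a PIC value using the thresholds cited in the text [24]."""
    if pic > 0.50:
        return "high"
    if pic >= 0.25:
        return "moderate"
    return "low"


# Counts of polymorphic markers per class, as reported in the text.
counts = {"high": 19, "moderate": 154, "low": 1}
total = sum(counts.values())  # 174 polymorphic SSRs

# Percentage of polymorphic markers in each class, rounded to two decimals.
percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Average allele number: 552 alleles over the 174 polymorphic SSRs.
mean_alleles = round(552 / total, 2)
```

The same denominator of 174 reproduces both the class percentages and the average of 3.17 alleles per locus, so the averages in this section appear to be taken over the polymorphic markers only.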

Population structure analysis and AMOVA
Population structure analysis enhances the understanding of genetic relationships among genotypes and also provides the basis for association mapping studies. Until now, no effort has been directed at studying the population structure of Indian mustard cultivars. In the present study, the results for K values of 1 to 9 exhibited a sharp peak of delta K at K = 2 (Fig 2), inferring the presence of two subpopulations, and all further interpretations were carried out according to this K value. The allelic variation patterns in the bar diagram (Fig 3) indicated large-scale admixture, which means that such genotypes have mixed parentage from dissimilar gene pools. Based on a membership probability criterion of ≥0.80 for a cultivar to be considered pure, 61 genotypes were assigned to SP1, of which 44 (50.57%) cultivars were pure lines and 17 (19.54%) were admixtures; in SP2, of a total of 26 genotypes, 21 (24.14%) were pure lines and 5 (5.75%) were of admixture type. The presence of admixtures indicates that natural outcrossing and cross-hybridization were involved in the development of these varieties. The subpopulations obtained by population structure analysis were subjected to analysis of molecular variance (AMOVA) to quantify the percentage of variation among and within subpopulations. AMOVA showed that most of the genetic variation resides among the cultivars, which accounted for 98% of the total variation, while the remaining 2% was attributed to variation between subpopulations (Fig 4, Table 3). We therefore infer that the main genetic variation originates from differences among individuals and not from differences between groups.
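The delta K peak can be illustrated with a minimal Python sketch, assuming the statistic is the commonly used one: the absolute second-order difference of the mean ln P(D) across successive K values, divided by the standard deviation of ln P(D) among replicate runs at that K. The function name `delta_k` and the ln P(D) values below are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps K to the list of ln P(D) values from replicate runs.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the ends of the K range
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        sd = stdev(lnp_by_k[k])
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical ln P(D) values, three replicate runs per K.
runs = {
    1: [-5000, -5002, -4998],
    2: [-4600, -4601, -4599],
    3: [-4550, -4560, -4540],
    4: [-4520, -4500, -4540],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

Because the statistic needs both neighbouring K values, it cannot be evaluated at K = 1 or at the largest K tested, so a range of 1 to 9 yields delta K values for K = 2 to 8 only.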

Unique alleles/DNA fingerprint development
In the present study, the SSR allele size matrix was analyzed to identify unique genotype-specific alleles (a particular allele appearing in only one variety) to define an SSR-based DNA fingerprint for that variety. A total of 36 unique alleles were detected in the dataset, with 31 SSR markers distinguishing 27 Indian mustard varieties (Table 4).
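The search for variety-specific alleles can be sketched in Python as follows. The nested-dictionary input layout, the function name `unique_alleles`, and the variety and marker labels are illustrative assumptions and do not correspond to entries in Table 4.

```python
from collections import defaultdict


def unique_alleles(genotypes):
    """List alleles that occur in exactly one variety.

    `genotypes` maps variety -> marker -> set of allele sizes (bp).
    Returns a sorted list of (variety, marker, allele) tuples.
    """
    carriers = defaultdict(set)
    for variety, markers in genotypes.items():
        for marker, alleles in markers.items():
            for allele in alleles:
                carriers[(marker, allele)].add(variety)

    # Keep only marker/allele combinations carried by a single variety.
    return sorted(
        (next(iter(varieties)), marker, allele)
        for (marker, allele), varieties in carriers.items()
        if len(varieties) == 1
    )


# Hypothetical score sheet for three varieties and two markers.
scores = {
    "V1": {"M1": {150}, "M2": {200}},
    "V2": {"M1": {150}, "M2": {210}},
    "V3": {"M1": {160}, "M2": {200}},
}
fingerprints = unique_alleles(scores)
```

In this toy score sheet V1 carries no unique allele, which mirrors the situation in the study where only 27 of the 87 varieties could be tagged.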

Discussion
Evaluation of genetic diversity and population structure has many positive implications for genetic resource conservation and for developing an effective breeding program. The potential for identifying a superior genotype increases with proper estimation of genetic diversity in a given population set. In the present study, 200 SSR markers distributed over all eighteen chromosomes of Indian mustard were used for genotyping of 87 Indian mustard varieties. Using molecular markers distributed throughout the entire genome of a crop ensures that all regions of the genome have an equal chance of representation, thus avoiding inaccurate estimates of the genetic similarities/dissimilarities among the individuals [25]. A total of 174 SSR primer pairs amplified polymorphic products, with an average allele number of 3.17 and a mean PIC value of 0.39. In a similar study, a lower average number of alleles (2.37) and a lower mean PIC value (0.32) than in the present study were reported when genetic diversity was analyzed in 23 Indian mustard genotypes using 16 SSR markers [6]. In another study, a lower average PIC value (0.32) was also reported when 165 inbred lines of B. oleracea var. botrytis were genotyped using 43 SSRs, inferring the presence of narrow genetic diversity in the genotype panel [26]. Genetic diversity parameters depend upon the origin of the genotypes under study, whether they are inter-related or not, and the type of molecular marker used. On the contrary, a higher average number of alleles (3.57) and a higher average PIC value (0.48) per SSR locus were reported when 95 germplasm accessions of B. juncea were characterized using 44 SSR markers in a similar study [27]. The lower mean PIC value and gene diversity value obtained in the present study indicated a lower level of genetic diversity among the Indian mustard varieties.
The UNJ dendrogram divided the 87 Indian mustard varieties into two major clusters. Structure analysis also grouped the present set of varieties into two subpopulations, which is in concurrence with the neighbor joining grouping method. A few mismatches were nevertheless observed between subpopulation I and cluster I and between subpopulation II and cluster II: varieties Prakash, RH 30, Vasundhara, Rohini, NRCHB 101, RCC 4 and HYT 33 of subpopulation I were grouped into cluster II, and varieties Sitara Singar and PBR 210 of subpopulation II were grouped into cluster I. Of the nine varieties that were placed differently by the two grouping methods, eight had 30-50% admixture from the other subpopulation, which may be one of the reasons for such mismatches. In an earlier investigation, 31 Indian mustard varieties had been grouped into five different clusters on the basis of multivariate analysis following Euclidean distance and the UPGMA method [7]. In a similar study, population structure analysis had been carried out to determine the extent of genetic variation among 58 leafy mustard (B. juncea var. rugosa) germplasm lines using 159 SSRs [10], which classified them into four subpopulations. Population structure had also been determined in 67 B. carinata (Ethiopian mustard) germplasm lines using SSR markers, and three subpopulations were obtained [12].
Selection of diverse parents in a hybridization programme is the key to creating more transgressive segregants in early generations and to providing better scope for selection of desired recombinants. Clustering helps in grouping genotypes into diverse groups, but it does not reveal the genetic makeup of a genotype. Structure analysis reveals the extent of admixture from other subpopulations and hence provides the right information for selection of parents from diverse groups. Parents for hybridization should be representative of diverse gene pools and at the same time should have minimum admixture from other subpopulations. Varieties Basanti, Sitara Singar, Pusa Karishma and RLM 619 of cluster II and Varuna, Urvashi, Pusa Mustard 25 (NPJ 112) and PBR 210 of cluster I expressed more distinctness from other varieties of their respective group, as depicted by greater edge length (Fig 1). These representative varieties from each cluster may be suitable for a hybridization programme to generate superior transgressive segregants.
Pedigree, geographical diversity and trait advantage have long been the criteria for selection of parents for hybridization in applied breeding, but with the advent of molecular markers, diversity at the molecular level has also been given more consideration. One reason for the narrow genetic base of B. juncea is its restricted geographical diversity. Cultivation of this species has remained restricted largely to the northern states of India, though it has shown promise in Australia and Canada. Another reason is that its large-scale cultivation in India is of recent origin. Although B. juncea as a species is thought to have existed for about 2500 years, large-scale cultivation started only about 100 years ago, when B. juncea, owing to its inherent tolerance to abiotic and biotic stresses, replaced the then-prevalent B. campestris var. brown sarson [15]. B. juncea (AABB), being an amphidiploid, expressed better tolerance to prevalent diseases and insect-pests. Genetic diversity had earlier been evaluated in B. juncea varieties on a morphological basis, and varieties developed in eastern and western states were found to be distinct from those of the northern states. In the present study, we compared diversity at the molecular level among varieties developed in different states. The present set of 87 varieties was developed at 25 different centres/locations. For centres that developed more than two varieties, we observed diversity among varieties from the same centre, as shown by their grouping into different subpopulations. Pedigree-wise analysis of varieties revealed that Varuna was used as a parent in 21 varieties.
Among the present set of varieties, the most similar pairs on the basis of SSR marker variation were Patan Mustard 67 and PCR 7; Ashirwad and Aravali; RH 406 and RH 749; and Krishna and Kranti. Both PM 67 and PCR 7 were derived from germplasm collected from Gujarat and hence are expected to share a common gene pool. Ashirwad and Aravali had Krishna as a common parent in their pedigree. RH 406 and RH 749 were both developed at the same centre; though they did not share common parents in their pedigree, sharing of a common gene pool from germplasm is expected. Krishna, being a direct selection from Kranti, remained closely similar to it. Out of 21 varieties having Varuna in their pedigree, six varieties (Vardan, Vaibhav, RL 1359, Pusa Jaikisan, Urvashi and JM 2) remained in subpopulation II, while 15 varieties acquired sufficient variation to move to subpopulation I. Structure analysis showed that varieties that had acquired admixture from the other subpopulation turned out to be more distinct from other varieties of the same subpopulation. Varieties Basanti, RLM 619, Sitara Singar, Urvashi and Pusa Karishma, having 30-50% admixture, expressed more distinctness from other varieties of the same group. However, other varieties with a similar extent of admixture remained similar to the rest of their group, so it may be inferred that distinctness depends upon the alleles introgressed and not on the extent of admixture. Thus, the earlier hypothesis that the repeated use of a few widely adapted cultivars as parents in hybridization led to a narrow genetic base and yield stagnation in Indian mustard [15] has been proved in this study at the molecular level using informative SSR markers.
In the present study, AMOVA results indicated greater genetic variation among the varieties (98%) than between subpopulations (2%). Higher levels of variation within individuals than among subpopulations were also reported in evaluations of the population structure of leafy mustard (B. juncea var. rugosa) [10] and Ethiopian mustard (B. carinata) [12].
To establish the newness of a crop genotype or variety, the DUS (distinctness, uniformity and stability) test involves almost two years of field testing along with reference varieties, which is quite cumbersome and time consuming [28]. On the other hand, the use of molecular markers for this purpose offers a rapid, robust, less time consuming and more reliable approach to varietal identification. In recent years, fingerprinting of Indian mustard varieties using DNA-based molecular markers has become of paramount significance for the unambiguous and fast identification of morphologically similar varieties, which could prevent disputes over varietal ownership [29]. In the present study, unique alleles were identified that can serve as DNA fingerprints to distinguish a particular variety from other varieties. A total of 31 SSRs produced 36 unique alleles for 27 Indian mustard varieties, which can be deployed as molecular tags or DNA fingerprints for quick identification of these varieties.

Conclusion
The present study constitutes the first attempt to understand the genetic variability of Indian mustard varieties and to develop unique DNA fingerprints for them using SSR markers. Both cluster analysis and population structure analysis divided the cultivars into two major groups. SSR marker variation and pedigree analysis of released varieties indicated a narrow genetic base. We therefore suggest interspecific hybridization, resynthesis of B. juncea, de novo derivation of B. juncea from hybridization between nonparental amphidiploids, mutation breeding and population improvement methods for broadening the genetic base of B. juncea varieties. The results of the present study can be useful in formulating breeding programs for Indian mustard, as they can assist in identifying genetically diverse genotypes to be used as parents for genetic improvement of this crop. The polymorphic SSRs identified in this study would facilitate marker-assisted breeding and QTL and gene mapping studies through linkage analysis and association mapping in Indian mustard. Further, 36 unique DNA fingerprints have been developed for 27 Indian mustard varieties, which can be used for registration under the Plant Varieties and Farmers' Rights Act-2001 to obtain plant varietal protection and to resolve disputes in seed certification. However, owing to the narrow genetic base or similar genetic composition of Indian mustard varieties, a more comprehensive set of SSR markers is required to characterize these varieties and to develop unique DNA fingerprints for all of them.